A Comparison of the Performance of SaP::GPU and Intel’s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems
Authors
Abstract
SaP::GPU is a solver developed in the Simulation Based Engineering Lab (SBEL) [1] to solve large banded and sparse linear systems on the GPU. This report compares the performance of the SaP::GPU banded solver with that of Intel’s Math Kernel Library (MKL) [2] on a large set of synthetic problems. The results of several numerical experiments indicate that, for large dense banded matrices, SaP::GPU is two to five times faster than the latest version of the MKL dense solver when the latter is run on the Haswell, Ivy Bridge, or Phi architectures.
Related resources
Analysis of A Splitting Approach for the Parallel Solution of Linear Systems on GPU Cards
We discuss an approach for solving sparse or dense banded linear systems Ax = b on a graphics processing unit (GPU) card. The matrix A ∈ R^{N×N} is possibly nonsymmetric and moderately large, i.e., 10,000 ≤ N ≤ 500,000. The split and parallelize (SaP) approach seeks to partition the matrix A into diagonal subblocks A_i, i = 1, ..., P, which are independently factored in parallel. The solution...
SPIKE::GPU A SPIKE-based preconditioned GPU Solver for Sparse Linear Systems
This contribution outlines an approach that draws on general-purpose graphics processing unit (GPGPU) computing to solve large linear systems. The proposed methodology relies on a SPIKE-based preconditioner with a Krylov-subspace method and has the following three stages: (i) row/column reordering for boosting diagonal dominance and reducing bandwidth; (ii) applying single precision truncated SP...
Toward a High Performance Tile Divide and Conquer Algorithm for the Dense Symmetric Eigenvalue Problem
Classical solvers for the dense symmetric eigenvalue problem suffer from the first step involving a reduction to tridiagonal form that is dominated by the cost of accessing memory during the panel factorization. The solution is to reduce the matrix to a banded form, which then requires the eigenvalues of the banded matrix to be computed. The standard D&C algorithm can be modified for this purpo...
On the performance and energy efficiency of sparse linear algebra on GPUs
In this paper we unveil some performance and energy efficiency frontiers for sparse computations on GPU-based supercomputers. We compare the resource efficiency of different sparse matrix–vector products (SpMV) taken from libraries such as cuSPARSE and MAGMA for GPU and Intel’s MKL for multicore CPUs, and develop a GPU sparse matrix–matrix product (SpMM) implementation that handles the simultan...
A scalable approach to solving dense linear algebra problems on hybrid CPU-GPU systems
Aiming to fully exploit the computing power of all CPUs and all GPUs on hybrid CPU-GPU systems to solve dense linear algebra problems, we design a class of heterogeneous tile algorithms to maximize the degree of parallelism, to minimize the communication volume, as well as to accommodate the heterogeneity between CPUs and GPUs. The new heterogeneous tile algorithms are executed upon our decentr...